Creating features with Animal Crossing data

DSST 289: Introduction to Data Science

Erik Fredner

2024-09-18

Overview

  • Exam reminders
  • 🧶 Knitting follow-up
  • 🐟 Animal Crossing data
  • mutate()
  • if_else()
  • case_when()

Exam reminders 1

  • Exam 01 take-home will be available today at 2pm
    • You will receive an automated email announcing availability
    • It will be in Blackboard > Course Documents > Exams > Exam 01
  • Take-home is due by the beginning of our class Wednesday
    • You may spend as much time as you want on it
    • I expect it will take 60-90 minutes

Exam reminders 2

  • You may reference notes, notebooks, and slides on the take-home
    • Material from today is on the take-home
  • In-class portion of the exam on Wednesday
    • Not allowed to use anything other than pen and paper
    • In-class question types:
      • recreate graph
      • recreate table
      • fill in blanks in code
  • Questions?

🧶 Knitting follow-up

  • Turns out Blackboard does not permit students to upload .html files
  • Two alternatives:
    • Knit to .pdf
    • .zip the .html

Knit to .pdf

Knit to .pdf option

Knit to .pdf installation

If knitting to .pdf fails because no LaTeX installation is found, install TinyTeX by running this once in the R console:

tinytex::install_tinytex()

.zip the .html (Mac)

If knitting to .pdf does not work, compress your .html file like so:

Compress .html file (Mac)

.zip the .html (Windows)

Instructions for Windows here.

Compress files (Windows)

🎣 = 💰

What is the best way to make money fishing?

Fishing in Animal Crossing

Animal Crossing fish data

| name | value | location | spawn_low | spawn_high |
|---|---|---|---|---|
| clown fish | 650 | Sea | 5 | 6 |
| ray | 3000 | Sea | 2 | 2 |
| sea butterfly | 1000 | Sea | 10 | 11 |
| golden trout | 15000 | River | 1 | 1 |
| saddled bichir | 4000 | River | 1 | 1 |

Computing spawn rates with mutate()

  • Spawn rates tell us how likely we are to see a fish.
  • Fish with higher spawn rates show up more often than fish with lower ones.

fish |>
  mutate(spawn_rate = (spawn_low + spawn_high) / 2) |>
  select(name, spawn_low, spawn_high, spawn_rate) |>
  arrange(desc(spawn_rate)) |>
  slice_head(n = 5) |> 
  kable()

| name | spawn_low | spawn_high | spawn_rate |
|---|---|---|---|
| salmon | 20 | 20 | 20.0 |
| pond smelt | 18 | 20 | 19.0 |
| horse mackerel | 14 | 21 | 17.5 |
| bitterling | 12 | 17 | 14.5 |
| sea bass | 11 | 18 | 14.5 |
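
As a quick check of the arithmetic, the averaging step can be reproduced on a few of the rows shown above. This is a minimal sketch, assuming dplyr is installed; the tibble name `fish_sample` is made up for illustration and holds three rows typed in by hand.

```r
library(dplyr)

# Three rows copied by hand from the table above; fish_sample is a made-up name
fish_sample <- tibble(
  name = c("salmon", "pond smelt", "horse mackerel"),
  spawn_low = c(20, 18, 14),
  spawn_high = c(20, 20, 21)
)

# mutate() works on whole columns at once and returns one new value per row
fish_rates <- fish_sample |>
  mutate(spawn_rate = (spawn_low + spawn_high) / 2)

fish_rates$spawn_rate
# 20.0 19.0 17.5
```

The original columns are kept; mutate() only adds `spawn_rate` alongside them.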

mutate with if_else

fish |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    spawn_freq = if_else(spawn_rate > 10, "common", "rare")
  ) |>
  select(name, spawn_rate, spawn_freq) |>
  # sample first, then sort: slice_sample() returns rows in random order
  slice_sample(n = 2) |>
  arrange(desc(spawn_rate)) |>
  kable()

| name | spawn_rate | spawn_freq |
|---|---|---|
| frog | 8 | rare |
| black bass | 8 | rare |
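
Two edge cases of the rule above are worth seeing: a rate of exactly 10, and a missing rate. The sketch below uses made-up spawn rates chosen to probe those edges; the `"unknown"` label is an assumption for illustration, not something used in the fish data.

```r
library(dplyr)

# Made-up spawn rates chosen to probe the edges of the rule
spawn_rate <- c(14.5, 10, 8, NA)

# > is strict, so a rate of exactly 10 is labeled "rare";
# an NA rate stays NA unless `missing` supplies a label
if_else(spawn_rate > 10, "common", "rare")
# "common" "rare" "rare" NA

labels <- if_else(spawn_rate > 10, "common", "rare", missing = "unknown")
labels
# "common" "rare" "rare" "unknown"
```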

Is common vs. rare enough?

fish |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    spawn_freq = if_else(spawn_rate > 10, "common", "rare")
  ) |>
  ggplot(aes(x = spawn_rate, y = value, color = spawn_freq)) +
  geom_point() +
  scale_color_viridis_d()

Common vs. rare probably isn’t enough

fish |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    spawn_freq = if_else(spawn_rate > 10, "common", "rare")
  ) |>
  count(spawn_freq) |> 
  kable()

| spawn_freq | n |
|---|---|
| common | 8 |
| rare | 72 |

mutate + case_when

A useful combination to create new variables based on multiple conditions:

fish <- fish |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    spawn_freq = case_when(
      spawn_rate > 15 ~ "very common",
      spawn_rate > 10 ~ "common",
      spawn_rate > 5 ~ "uncommon",
      spawn_rate > 2 ~ "rare",
      spawn_rate <= 2 ~ "very rare",
      TRUE ~ "default"  # fallback: only reached when spawn_rate is NA
    )
  )
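
The order of the conditions matters: case_when() checks them top to bottom and each row takes the label of the first condition that is true. A minimal sketch with made-up rates (one per intended category) shows what happens when the broadest test is listed first:

```r
library(dplyr)

rates <- c(17.5, 14.5, 6, 1)  # made-up rates, one per intended category

# Most specific test first, as on the slide
ordered_labels <- case_when(
  rates > 15 ~ "very common",
  rates > 10 ~ "common",
  rates > 5 ~ "uncommon",
  rates > 2 ~ "rare",
  rates <= 2 ~ "very rare"
)
ordered_labels
# "very common" "common" "uncommon" "very rare"

# Broadest test first: every rate above 2 stops at "rare"
shuffled_labels <- case_when(
  rates > 2 ~ "rare",
  rates > 5 ~ "uncommon",
  rates > 10 ~ "common",
  rates > 15 ~ "very common",
  rates <= 2 ~ "very rare"
)
shuffled_labels
# "rare" "rare" "rare" "very rare"
```

Because the five numeric tests together cover every possible number, a final catch-all branch is only reached by rows whose spawn rate is missing.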

What that data looks like

fish |>
  select(name, spawn_rate, spawn_freq) |>
  slice_sample(n = 5) |>
  arrange(spawn_rate) |>
  kable()

| name | spawn_rate | spawn_freq |
|---|---|---|
| arapaima | 1.0 | very rare |
| pop-eyed goldfish | 1.5 | very rare |
| sea horse | 6.0 | uncommon |
| tilapia | 8.0 | uncommon |
| sea bass | 14.5 | common |

Is the distribution better?

fish |> 
  count(spawn_freq) |> 
  kable()

| spawn_freq | n |
|---|---|
| very rare | 31 |
| rare | 20 |
| uncommon | 21 |
| common | 5 |
| very common | 3 |
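
The counts above run from rarest to most common rather than alphabetically, which suggests that `spawn_freq` was stored as a factor with ordered levels somewhere not shown on the slides. That is an inference, not something the slides state. A minimal sketch of one way to get that ordering, using made-up labels in a tibble called `toy`:

```r
library(dplyr)

freq_levels <- c("very rare", "rare", "uncommon", "common", "very common")

# Made-up labels standing in for the spawn_freq column
toy <- tibble(
  spawn_freq = c("common", "very rare", "rare", "very rare", "uncommon")
)

# As plain text, count() sorts the labels alphabetically
toy |> count(spawn_freq)

# As a factor, count() follows the level order given in freq_levels
toy_counts <- toy |>
  mutate(spawn_freq = factor(spawn_freq, levels = freq_levels)) |>
  count(spawn_freq)

toy_counts
```

The same level order also controls the legend order in a ggplot, so the colors run from "very rare" to "very common".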

Which fish are relatively valuable and relatively common?

fish |>
  ggplot(aes(x = spawn_rate, y = value, color = spawn_freq)) +
  geom_point() +
  scale_color_viridis_d()

Where should we fish?

fish |>
  select(name, location, location_condition) |>
  filter(!is.na(location_condition)) |>
  slice_sample(n = 3) |>
  kable()

| name | location | location_condition |
|---|---|---|
| sturgeon | River | mouth |
| king salmon | River | mouth |
| cherry salmon | River | clifftop |

Where are the above-average fish?

fish |>
  filter(value > mean(value)) |>
  count(location, location_condition) |>
  arrange(desc(n)) |>
  kable()

| location | location_condition | n |
|---|---|---|
| Sea | NA | 10 |
| River | NA | 6 |
| Pier | NA | 4 |
| Pond | NA | 4 |
| River | clifftop | 3 |
| River | mouth | 1 |
| Sea | rainy days | 1 |
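
Most rows above have no `location_condition`, so the table is dominated by `NA`. One possible feature is a single `spot` column that combines both location columns. The sketch below is an illustration only: the `"anywhere"` label is an assumption about what a missing condition means, and "fish A" and "fish B" are made-up rows.

```r
library(dplyr)

# sturgeon and cherry salmon come from the slides;
# "fish A" and "fish B" are made-up rows with no special condition
toy <- tibble(
  name = c("sturgeon", "cherry salmon", "fish A", "fish B"),
  location = c("River", "River", "Sea", "Pond"),
  location_condition = c("mouth", "clifftop", NA, NA)
)

spots <- toy |>
  mutate(
    # coalesce() keeps the first non-missing value in each position
    location_condition = coalesce(location_condition, "anywhere"),
    spot = if_else(
      location_condition == "anywhere",
      location,
      paste(location, location_condition)
    )
  )

spots$spot
# "River mouth" "River clifftop" "Sea" "Pond"
```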

Mean and median value by location

fish |>
  group_by(location) |>
  summarize(median_value = median(value),
            mean_value = mean(value),
            min_value = min(value),
            max_value = max(value)) |>
  arrange(desc(mean_value)) |>
  kable()

| location | median_value | mean_value | min_value | max_value |
|---|---|---|---|---|
| Pier | 6500 | 6875.000 | 4500 | 10000 |
| Sea | 1750 | 4381.667 | 150 | 15000 |
| River | 1400 | 3418.529 | 160 | 15000 |
| Pond | 1050 | 2035.000 | 100 | 6000 |

Location, location, location

fish |>
  filter(value > mean(value)) |>
  ggplot(aes(x = spawn_rate, y = value, color = location)) +
  geom_point() +
  # geom_label_repel() comes from the ggrepel package, which must be loaded
  geom_label_repel(aes(label = name)) +
  scale_color_viridis_d()

Will we make more money if we catch less valuable fish more often?

From spawn rate to spawn probability

fish |>
  mutate(spawn_prob = spawn_rate / 100) |>
  select(name, location, spawn_rate, spawn_prob) |>
  slice_sample(n = 5) |>
  kable()

| name | location | spawn_rate | spawn_prob |
|---|---|---|---|
| mahi-mahi | Pier | 1.0 | 0.010 |
| ranchu goldfish | Pond | 1.5 | 0.015 |
| frog | Pond | 8.0 | 0.080 |
| crawfish | Pond | 8.0 | 0.080 |
| arapaima | River | 1.0 | 0.010 |
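
One way to weigh value against rarity is to multiply the two. The sketch below does this for the five fish from the first data table in these slides. It assumes `spawn_prob` can be read as the chance that any single catch is that fish, which is the slide's own simplification; `expected_value` is a column name introduced here for illustration.

```r
library(dplyr)

# Rows copied from the first data table in these slides
fish_sample <- tibble(
  name = c("clown fish", "ray", "sea butterfly", "golden trout", "saddled bichir"),
  value = c(650, 3000, 1000, 15000, 4000),
  spawn_low = c(5, 2, 10, 1, 1),
  spawn_high = c(6, 2, 11, 1, 1)
)

expected <- fish_sample |>
  mutate(
    spawn_rate = (spawn_low + spawn_high) / 2,
    spawn_prob = spawn_rate / 100,
    # average payout per catch attributable to this fish
    expected_value = value * spawn_prob
  ) |>
  select(name, value, spawn_prob, expected_value) |>
  arrange(desc(expected_value))

expected
```

In this small sample the sea butterfly (value 1000, expected 105 per catch) outranks the ray (value 3000, expected 60), so a cheaper fish can be worth more in the long run if it turns up often enough.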

Sneak preview: Fishing simulation

Catch 5 fish in each location 6 times

Catch 50 fish 6 times

Catch 5,000 fish 6 times
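
The simulation code is not shown on these slides. As a rough idea of how such a simulation might work, the sketch below treats each catch as an independent weighted draw, with weights proportional to `spawn_rate`. That is an assumption, as are the five-fish sample, the seed, and the variable names; the sketch also ignores location.

```r
library(dplyr)

# Five fish from the first data table, with spawn_rate already averaged
fish_sample <- tibble(
  name = c("clown fish", "ray", "sea butterfly", "golden trout", "saddled bichir"),
  value = c(650, 3000, 1000, 15000, 4000),
  spawn_rate = c(5.5, 2, 10.5, 1, 1)
)

set.seed(289)  # arbitrary seed so reruns give the same draws
n_catches <- 5000

# Each catch is one weighted draw; sample() rescales prob to sum to 1
caught <- tibble(
  caught_fish = sample(
    fish_sample$name,
    size = n_catches,
    replace = TRUE,
    prob = fish_sample$spawn_rate
  )
)

# Tally the catches and attach each fish's value
totals <- caught |>
  count(caught_fish, name = "total_caught") |>
  left_join(fish_sample, by = c("caught_fish" = "name")) |>
  mutate(total_value = total_caught * value) |>
  select(caught_fish, total_caught, total_value) |>
  arrange(desc(total_value))

totals
```

With more catches, the share of each fish settles toward its weight, which is why the rankings become more stable as the number of catches grows.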

🏆 MVF (Most Valuable Fish)

| caught_fish | total_caught | total_value |
|---|---|---|
| great white shark | 298 | $4,470,000 |
| blowfish | 690 | $3,450,000 |
| sturgeon | 322 | $3,220,000 |
| dorado | 212 | $3,180,000 |
| coelacanth | 191 | $2,865,000 |
| barred knifejaw | 544 | $2,720,000 |
| red snapper | 790 | $2,370,000 |
| stringfish | 147 | $2,205,000 |
| soft-shelled turtle | 556 | $2,085,000 |
| arowana | 208 | $2,080,000 |
| football fish | 783 | $1,957,500 |
| barreleye | 129 | $1,935,000 |
| golden trout | 123 | $1,845,000 |
| whale shark | 141 | $1,833,000 |
| salmon | 2617 | $1,831,900 |
| bitterling | 1871 | $1,683,900 |
| snapping turtle | 320 | $1,600,000 |
| saw shark | 132 | $1,584,000 |
| arapaima | 144 | $1,440,000 |
| Napoleonfish | 143 | $1,430,000 |
| sea butterfly | 1397 | $1,397,000 |
| koi | 345 | $1,380,000 |
| angelfish | 452 | $1,356,000 |
| giant snakehead | 243 | $1,336,500 |
| oarfish | 137 | $1,233,000 |
| char | 312 | $1,185,600 |
| gar | 197 | $1,182,000 |
| king salmon | 638 | $1,148,400 |
| mitten crab | 544 | $1,088,000 |
| hammerhead shark | 128 | $1,024,000 |
| pond smelt | 2543 | $1,017,200 |
| sweetfish | 1000 | $900,000 |
| sea horse | 801 | $881,100 |
| betta | 344 | $860,000 |
| ranchu goldfish | 184 | $828,000 |
| tilapia | 996 | $796,800 |
| cherry salmon | 791 | $791,000 |
| sea bass | 1956 | $782,400 |
| ray | 260 | $780,000 |
| loach | 1839 | $735,600 |
| catfish | 870 | $696,000 |
| ocean sunfish | 164 | $656,000 |
| piranha | 260 | $650,000 |
| butterfly fish | 580 | $580,000 |
| squid | 1126 | $563,000 |
| saddled bichir | 135 | $540,000 |
| olive flounder | 671 | $536,800 |
| moray eel | 267 | $534,000 |
| dab | 1692 | $507,600 |
| zebra turkeyfish | 938 | $469,000 |
| guppy | 347 | $451,100 |
| clown fish | 681 | $442,650 |
| goldfish | 332 | $431,600 |
| suckerfish | 274 | $411,000 |
| black bass | 1024 | $409,600 |
| nibble fish | 272 | $408,000 |
| horse mackerel | 2306 | $345,900 |
| yellow perch | 1137 | $341,100 |
| pike | 185 | $333,000 |
| rainbowfish | 333 | $266,400 |
| surgeonfish | 257 | $257,000 |
| pop-eyed goldfish | 190 | $247,000 |
| puffer fish | 965 | $241,250 |
| carp | 801 | $240,300 |
| crawfish | 1075 | $215,000 |
| dace | 841 | $201,840 |
| pale chub | 983 | $196,600 |
| bluegill | 1083 | $194,940 |
| ribbon eel | 300 | $180,000 |
| freshwater goby | 438 | $175,200 |
| crucian carp | 1065 | $170,400 |
| neon tetra | 294 | $147,000 |
| killifish | 433 | $129,900 |
| frog | 1020 | $122,400 |
| anchovy | 456 | $91,200 |
| tadpole | 737 | $73,700 |